perf: Optimize Utf8View string concat#21535
Merged
comphead merged 4 commits intoapache:mainfrom Apr 10, 2026
Merged
Conversation
Contributor
Author
Thanks for the review! I have a bunch of PRs out that all follow a similar pattern, replacing per-row NULL handling with bulk NULL bitmap operations:
etc. |
comphead
reviewed
Apr 10, 2026
| buffer.clear(); | ||
| write!(&mut buffer, "{left}{right}") | ||
| .expect("writing into string buffer failed"); | ||
| buffer.push_str(l); |
comphead
reviewed
Apr 10, 2026
| let nulls = NullBuffer::union(left.nulls(), right.nulls()); | ||
|
|
||
| for i in 0..left.len() { | ||
| if nulls.as_ref().is_some_and(|n| n.is_null(i)) { |
Contributor
There was a problem hiding this comment.
do we really need to check nulls.as_ref().is_some in loop?
Contributor
Author
There was a problem hiding this comment.
We could hoist this outside the loop, although
- This is a very common pattern in the code base
- I'd think the branch predictor should be able to handle this very effectively
- We'd need to duplicate the loop body to handle the two cases, no?
I'd say the current approach is okay but lmk if you disagree.
Contributor
There was a problem hiding this comment.
Just checked with godbolt and and yeah, looks like compiler rewrote the thing correctly so it identifies is_some outside of the loop 😮
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
Optimize
||forUtf8Viewvalues in two ways:Options for every row, to check for NULL values. It is faster to precompute the NULL bitmap and then just check a single bitmap. Annoyingly,StringViewBuilderstill requires constructing the result NULL bitmap itself incrementally, but fixing that would be a more invasive change.write!({l}, {r}), which goes through thefmtmachinery and is very slow. It is more efficient to justwrite_strtwice.Benchmarks (Arm64):
What changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?
No.